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Abstract 

Outlying observations are commonly encountered in the analysis of 
time series. In this paper the problem of detecting additive outliers in 
integer-valued time series is considered. We show how Gibbs sampling 
can be used to detect outlying observations in INAR(l) processes. The 
methodology proposed is illustrated using examples as well as an observed 
data set. 
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1 Introduction 



This work considers a Bayesian approach to the problem of modelling a Poisson 
integer valued autoregressive time series contaminated with additive outliers. 

It is well known that unusual observations and intervention effects often 
occur in data sets and can have adverse effects on model identification and 
parameter estimation. In the framework of Gaussian linear time series the 
problem of detecting and estimating outliers and other intervention effects has 



been investigated by several authors including Fox (1972|, Tsay (19861, Chang 



et al. (19881, Chen and Liu (19931 and Justel et al. (20011, among others. 



However, the problem of modelling outliers and other intervention effects in 
the context of time series of counts has, as yet, received little attention in the 
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literature albeit its relevance for inference and diagnostics. Moreover, in this 
context additional motivation stems from the fact that the usual techniques for 
outlier removal are not adequate since often lead to non integer values. In the 
framework of count time series it is worth mentioning the work of |Fokianos and| 
Fried ( 2010 1 who investigate the problem of modelling intervention effects in 



INGARCH models andlBarczy et al. (2010 2011) who consider CLS estimation 



of the parameters of an INAR(l) model contaminated, at known time periods, 
with innovational and additive outliers, respectively. 

The well-known assertion of George Box that while all models are wrong 
some are useful, motivates that we approach the issue of modelling outliers in 
integer-valued time series focusing on the integer valued autoregressive model 
of order 1. In fact, this model introduced independently by |Al-Osh and Alzaid| 
( 1987 1 and McKenzie ( 1985 1 to model time series of counts, has been extensively 



studied in the literature and applied to many real-world problems including 



statistical process control, (Weifi 20071 because of its simplicity and easiness of 
interpretation. 

To motivate our approach, we represent in figure[T]a data set studied by |Weife| 



(20071 concerning the number of different IP addresses (approximately equiv- 



alent to the number of different users) accessing the server of the pages of the 
Department of Statistics of the University of Wiirzburg in two-minute periods 
from 10 am to 6 pm on the 29th November 2005, in a total of 241 observations. 
This time series is constructed from log data concerning accesses to pages on 



the server. Weifi (2007) models the data with a Poisson INAR(l) model and 
using statistical process control techniques finds an outlying observation at time 
t = 224. As described by that author a detailed analysis of the original log data 
showed that all the eight accesses at that time came from the AOL browser that 
is known to supply permanently new adresses within a small area. Therefore, it 
is not possible anymore to infer the user from the IP address. It is interesting 
to investigate if this observation can be explained by a simple INAR(l) model 
and if the fit can be improved by the inclusion of an additive outlier effect. 
Let {Xt} be a Poisson INAR(l) process satisfying 



X, 



a o Xt_ 



Xt 



et. 



(1) 



with {^k,j) a sequence of Bernoulli r.v. with mean a G [0,1] and {et}, 
the arrival process, a sequence of i.i.d. Poisson variables et ^ 7'o(A). When 
additive outliers (AO) occur at times ri, . . . ,rfc, with integer sizes wi, . . . 
Xt is unobservable and it is assumed that the observed series {Yi} satisfies 

Yt — Xt + X]i"=l ^t.Ti^i, 

where /c € N is the number of outliers and It,s is an indicator function taking 
the value 1 if < = s and otherwise. Roughly speaking an additive outlier can 
be interpreted as a measurement error or as an impulse due to some unspecified 
exogenous source at time t^, i — I, . . . ,k. 
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Figure 1: Number of different IP adresses accessing the server of the pages of 
the Department of Statistics of the University of Wiirzburg between 10 am and 
6 pm on 29 November 2005 



Here we consider a Bayesian approach to the problem of Poisson INAR(l) 
model specification in the presence of additive outliers. Gibbs sampling provides 
estimates for the probability of outlier occurrence at each time point leading 
to an effective outlier detection and accurate parameter estimation. Bayesian 
approaches have been used to detect outliers in ARMA models by |Justel et al.| 
( |2001[ ) and in bihnear models by |Chen| |l997[ ). 



The paper is organized as follows. Section [2] describes the setup of additive 
outliers in INAR(l) models and explains the procedure for outlier detection. 
Section |3] illustrates the methodology on several sets of simulated data as well 
as on a data set concerning the number of different IP addresses accessing the 
server of the pages of the Department of Statistics of the University of Wiirzburg. 
Section |4] concludes the paper. 

2 INAR(l) models with additive outliers 

Assume that the observed time series Yi , . . . , K„ is generated by 
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Yt^Xt + VtSt, l<t<n (2) 

where Xt is a Poisson INAR(l) process satisfying Q, Si, . . . ,6n are independent 
and identically distributed Bernoulli variables with P{St = 1) = e and rji, . . . ,rjn 
are independent random variables identically distributed as Po{l3). Also, 5t and 
r]t are independent for all t. This means that if i5t = 1 the observation Yt 
is contaminated with an AO of magnitude rjf Note that an outlier at time t 
affects the model only at instants t and t + 1. 



2.1 Estimation procedure 

In this section we describe the Bayesian approach via Gibbs sampling to estimate 
model ([2]). Assume that Yi = Xi, that is. there is no outlier in the first obser- 
vation and let Y = (Yi, . . . = (a, A), 8 = {5i, . . . ,(5„), r) = (rji, . . ■,T]n). 
Now we need to derive the conditional posterior distributions of 0, ^, r; and e. 
Conditioning on the first observation the likelihood of Y is given by 



n Mt 



t=2 j:=o 



(Xt^iy. 



(3) 



with Xt^Yt- i]tSt and Mt = min {Xt-i,Xt), i = 2, . . . , n. 

The prior distribution for the contamination parameter e is e Be(/i,(7), 
with expectation E{e) = h/{h + g). Regarding the INAR(l) parameters a and 
A we choose for prior distributions the conjugate of Binomial and Poisson, re- 



spectively and thus a ~ Be(a, 6), A ~ Ga(c, d) (Silva et al. 20051. The set of 
hyperparameters a, b, c, d, /3, h, g are assumed known. 

Let 7r(0, 5, rj, e) denote the prior distribution for (0, S, rj, e). Then 



7r(0, S, T7, e) cx e--*^ A^-^ a"-^ (1 - af-^ e''-^ (1 - e^-^ [| e^ 



t=2 



(4) 



The posterior distribution of 0, ^, ?7 and e is then given by 



7r(0,5,T7,e|y) cx L(0, ^, r/, e) 7r(0, <5, 77, e) 



(X 



-ldX+n0] ^c-1 ^-^ „ 



6-1 h-1 



(1-e) 



9-1 



(5) 



with 0<a<l, A>0, 0<e<l, and rjt ^ 0,1, t = 2,3, ... ,n. 
The complexity of the posterior marginals of S and rj suggest resorting to 
MCMC methods to implement the Bayesian approach described above. 



The full conditional posterior distributions for a and A are given by (Silva 
eralll2005| 
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n Mt 

7r(a|Y, A, S, rj, e) cx a''-^ (1 - a)''-^ [| ^ T{t, ^) a' (1 - a)-^'-^-' (6) 

t=2 1=0 

withT(^,^)=(^Cf- 
and 

n Mt 

7r(A|Y„, a, r,, e, 5) oc A=-ie-('^+")^ J] E '^(*' 

t=2 i=0 

with C/(t,i) (x^^iyrCf*"' a'(l - a)^'-i"', respectively. 

Now, with respect to the full conditional distribution of 5 we reason as 
follows. For each j — 2,...,n, (5j|(Y, a, A, rj, e, ^ Ber{pj), where 

denotes the vector S with the jth component deleted. Accordingly, we can write 



D/A 11V \ X \ = l,Y|a,A,T7,e, 

- = l|Y,«,A,r,,e, V.)) = /(y^ A, r;, 

But 

/(Y|a, A, ry, e, ,5(_,)) =P(^, = 1| . . .) f{Y\S, = 1, a, A, q, e, ^(_,)) 

+ P((5, -0|...) /(Y|5, -0,a,A,»7,e,<5(-,)) 

with P{5j = 1| ...) P{5j ^ l|a,A,?7j,e) = e. 
Therefore 



e/(Y|5, = l,a,A,r7,e,^(_,)) + (l-e)/(Y|J, =0,«,A,r,,e,^(_,)) ^ ^ 



To compute /(Y|5j = 1, a, A, jy, e, first note that from (|3| and the 

Markovian property of the INAR(l) model the outlier at time j affects the 
model for t = j and t — j + 1. Then, 



f(y\^3 = l>a>-A>»7ie>^(-i)) =f{Xj,Xj+i\Xj^i,5j = 1, a. A, r?, e, 

=f{Xj\Xj^i,5j = l,a, A,r7,e,^(_j)) 
X f{Xj+i\X.j,5j = 1,Q!, A,r7,e,(5(_j)) 

(10) 

with = e-^ E.=o (0^ Cf'-^ (1 - a)^'-"' and M, = 

min(Xt_i, Xt) as before. Moreover, if Sj = 1 then Xj =Yj— Therefore 
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fi^j\^j-i^^] = A,r7,e,5(_j)) =P(aoXj_i + =Xj\Xj^i,5j = 1,...) 

=P{a o + e^- = y,- - rjj\Xj^i,5j = 1 

Af 

=C 



(r,-,7,-z)! 
(11) 

and 

f{Xj+i\Xj,Sj = A, ?7, e, <5(_j)) ^P{aoXj + e^+i =Xj+i|Xj,(5j = 1, a, A, 77^,6) 

(12) 

with Mj* = mm(Ft - r]t, Xt+i). 

Similarly, if 5j = then Xj — Yj and therefore 

/(Y|(5, =0,a,A,r,,e,^(_,))) =0"^^ 

ni:cf-a-(i-a)---^ 

(13) 

To derive the conditional posterior distribution of 77 note that if 6j = 0, 
no outlier at t = j, there is no information about rjj except the prior. Then 
r]j\(Y, A, a, e, 5j — 0, 'n(-j)) ^ Po{P)- However, if 5j = \ Y contains information 
about rjj. Therefore, 

7r(77j |Y,A,a,e,(5j = l,»7(_j)) = 

7r(?7j|A, a, e, = 1)/(Y|A, a, e, (5^ = 1,77^) 



E^=o'^('7il^>a,e>^j = l)/(Y|A,a,e,(5j = 1,77^-) 



77, =0,1,2,... 



1, a, A, e), 
(14) 



with f{Xj,Xj^i \rij,Sj = l,a,A,e) as given in (lOl, (111 and (12) and r] 



denoting the vector rj with the jth component deleted. 

Finally, the conditional posterior distribution for e depends only on d. Since 
the prior distribution of e is Be{h, g) the conditional posterior is given by 



e| Y, \r),5 = e\5 ^ Be{h + k,g + n-l-k) 
where k is the estimated number of outliers (number of (5j's=l). 



(15) 
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2.2 Computational Issues 



We may use the full conditional distributions of a,X, d = {S2, ■ ■ ■ , Sn), r] = 
(r]2, . . . ,r]n) and e to draw a sample of a Markov chain which converges to the 
joint posterior distribution of the parameters. In most cases we can not gener- 
ate directly from the full conditionals. Since they are not log-concave densities 
we use Gibbs methodology within Metropolis step. In particular the Adap- 



tive Rejection Metropolis sampling - ARMS (Gilks et al. 1995) - is used inside 



the Gibbs sampler. When the number of iterations is sufficiently large, the 
Gibbs draw can be regarded as a sample from the joint posterior distribution. 
Accordingly there are two key issues in the successful implementation of this 
methodology: deciding the length of the chain and the burn-in period and es- 
tablishing the convergence of the chain. We use a burn-in period of M iterations 
and then iterate the Gibbs sampler for a further N iterations but retain only 
each Lth value. This thinning strategy reduces the autocorrelation within the 
chain. 

Once the posterior probability of outlier occurrence at each time point, 
Pj — P{Sj — 1|Y, a, A, ?7, e, (5(_j)) is estimated a cut-off point of 0.5 is used 
for detecting outliers, i.e. there is a possible outlier when pj > 0.5. 

We now discuss the other relevant issue in the proposed bayesian approach: 
the choice of the hyperparameters for prior distributions. Recall from the pre- 
vious section that a ~ Be(a,6), A ~ Ga(c, d). We set a = & = c = d = 0.001 
to use non informative prior distributions (Beta and Gamma distributions with 
large variability). For the prior for e ~ Be{h,g) we choose h = 5, g = 95 so 
that E{e) = 0.05 to reflect the prior belief that outliers occur occasionally with 
probability 0.05 for any time point. Regarding the parameter /3 of the prior 
distribution for the size of the outlier at time t, r]t ^ P{P) two approaches are 
pursued: an informative setup in which Pinfo is set equal to three times the stan- 
dard deviation of the 1-step-ahead prediction error and also a non-informative 
setup with Pninfo — 30 to reflect large variability. 



3 Illustration 

In this section we illustrate the performance of the above procedure with simu- 
lated data sets of 100 observations and the IP data example of section [T] 

3.1 Simulated data sets 

We consider time series simulated from several INAR(l) processes with a = 
0.15,0.5,0.85 and A = 1,3,5 with one and three outliers of different sizes 77 of 
order equal to three, five and seven times the standard deviation. The times of 
the outliers are generated randomly. 

The Gibbs sampler used to obtain the Bayesian estimates is iterated M+N = 
5000 times and the L — 5th value of the last N — 2500 iterations is kept, 
providing sample sizes of 500 values from which the estimates are computed as 
the sample means. 
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The results are reported for Pninfo — 30 since they do not differ from those 
obtained with Pmfo- 

The results are illustrated in figure |2] with simulated data from the model 
with parameters a = 0.85, A = 1, outliers at i = 9, 29, 75 with sizes f] = 7, 13, 18, 
respectively. Figure [2] represents the time series and the posterior probability 
of outlier occurrence for each time point, pt- The Gibbs sampling successfully 
detects the outliers with estimated size of f/g = 7, 7)29 — 12 and 7775 — 19. 




(b) 
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Figure 2: (a) Simulated data with a = 0.85, A = 1, outlier at times t = 
9, 29, 75 with sizes rj = 7, 13, 18, respectively; (b) posterior probability of outlier 
occurrence at each time point, fi — 7, 12, 19, respectively. 



The results for all the simulated models are summarized in table [T] for series 
contaminated with 3 outliers. The table contains: the parameters a and A used 
to generate the series with outliers of size 77s at times S, estimates for the pa- 
rameters a and A obtained by conditional least squares (assuming no outlier), 
Initial CLS and by Gibbs sampling. Final Bayes, the estimated probability of 
outlier occurrence. Probability, and the estimated outlier size for all the time 
points for which that probability is over the threshold 0.5, Final Bayes. For 
comparison purposes the table also presents the CLS estimates for the parame- 
ters a and A after removing the effect of the detected outliers. Final CLS. The 
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results presented in table [T] indicate that the procedure is able to detect additive 
outliers in INAR(l) models. For models with small variability (a and A small) 
small outliers are more difficult to detect. This is illustrated for an INAR(l) 
model with parameters a = 0.15, A = 1 and outliers of size 7759 = 5 and7734 = 7 
which are not detected. For models with larger variability the outliers are cor- 
rectly detected even when their size is small (see figure 2). Moreover, the results 
in table [T] illustrate the negative impact of the outliers on the estimates of a 
and A {Initial CLS). It is worthwhile noting that the estimates obtained from 
Gibbs sampling {Final Gibbs) and the conditional estimates obtained removing 
the effect of the detected outliers {Final CLS) are, in general, similar. However, 
for small a and for these particular simulated series, the Bayesian estimates are 
biased which is a typical behaviour for this range of a values ( Silva et al. 2005 1. 



3.2 IP data example 

Let us consider once again the motivating example of section ^ regarding the 
number of different IP addresses accessing the server of the Department of Statis- 
tics of the University of Wiirzburg on November 29th, 2005, between 10a.m. and 
6p.m., represented in figure [T] ( jWeifi 20071. The sample mean and variance of 
the series are x — 1.32,(7^ = 1.39. The autocorrelation and partial autocorrela- 
tion functions indicate that a model of order one is appropriate. CLS estimates 
for a and A are a — 0.22 and A — 1.03, respectively. The result of applying the 
proposed methodology is represented in figure [3]Jb) indicating the possible oc- 
currence of an outlier at time t — 224. The estimated size of the outlier is i) — 7. 
It is interesting to note hat setting the time of the outlier to t = 224 and using 
the results from Barczy et al. ( |2011j ) the CLS estimate for rj is fjcLS = 6.73. 
Removing the effect of the outlier a.t t = 224 the mean and variance of the 
resulting series are 1.29 and 1.2, respectively. The autocorrelation and partial 
autocorrelation functions still indicate that a model of order one is appropriate. 
CLS estimates for the parameters are now acLS — 0.29 and Xcls — 0.91 in 
accordance with the estimates obtained from the Gibbs sampling, asayes — 0.27 
and Xsayes = 0.89, whose posterior distribution is represented in figure [4] 



4 Concluding remarks 

In this paper, the Gibbs sampling for detecting additive outliers in Poisson 
INAR(l) time series is presented. We estimate the probability that an observa- 
tion is affected by an outlier. This procedure has the advantage of identifying 
observations that may require further scrutinizing. Note that the hyperparam- 
eters of the prior distributions of the outlier size and outlier occurrence proba- 
bility, P and e, respectively, are fixed but the same methodology applies if they 
are time dependent, /3t and et, say. Masking and swamping effects caused by 
patches of outliers may occur depending on the size and relative position of the 
outliers within the patch. The solution of this problem is being investigated. 
The extension of this methodology to models of higher-order, INAR(p) p > I, 
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is possible. The mathematical expressions are easily derived from the likelihood 
function. However, the later and consequently the full conditional posterior dis- 
tributions are highly complex. Therefore, the implementation of the methodol- 
ogy for higher order models requires additional computing effort. Since appli- 
cations of higher-order INAR models are scarce in the literature this extension 
has not been considered in this work. 
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Estimates 



Parameter 


True 


Initial 


Final 


Probability 






CLS 


Bayes 


CLS 




a 


0.15 


0.14 


0.07 


0.17 




A 


1 


1.20 


1.27 


1.05 




mi 


7 




- 




0.15 


V50 


5 




- 




0.07 


V63 


9 




9 




0.87 


a 


0.15 


0.09 


0.01 


0.03 




A 


3 


3.47 


3.40 


3.40 




1134 


9 




11 




0.99 


V50 


13 




13 




0.99 


ms 


6 




- 




0.12 


a 


0.15 


0.04 


0.32 


0.15 




A 


5 


6.35 


4.0 


5.1 






7 




- 




0.09 


V50 


12 




13 




0.96 


ms 


16 




18 




0.99 


a 


0.5 


0.22 


0.41 


0.37 




A 


1 


1.04 


0.94 


1.05 




V9 


10 




11 




0.90 


lh>7 


4 
~ 
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0.01 
0.81 


a 


0.5 


0.23 


0.59 


0.57 




A 


3 


4.72 


2.28 


2.39 




V9 


17 




19 




0.99 


mi 


12 




16 




0.99 


V97 


7 




10 




0.99 


a 


0.5 


0.26 


0.51 


0.57 




A 


5 


7.04 


4.30 


3.87 




V9 


10 




14 




0.91 


mi 


21 




22 




0.99 


V97 


15 




17 




0.99 


a 


0.85 


0.37 


0.86 


0.80 




A 


3 


1.27 


2.62 


3.90 




V9 


31 




29 




0.92 


m9 


13 




10 




0.99 


V75 


22 




22 




0.99 


a 


0.85 


0.46 


0.85 


0.85 




A 


5 


17.55 


4.60 


4.66 




ms 


40 




37 




0.92 




28 




27 




0.99 


V78 


17 




20 




0.99 



Table 1: Results from Gibbs sampling in simulated INAR(l) time series with 
parameters a and A, three outliers each of size r]s at time S. 
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(a) 




Figure 3: Number of different IP adresses accessing the server of the pages of 
the Department of Statistics of the University of Wiirzburg (a) and posterior 
probabihty of outHer occurrence at time t for the IP data set (b). 



13 



(a) (b) 




0.20 0.25 0.30 0.35 0.70 0.80 0.90 1.00 

a X 



Figure 4: Posterior distribution of a and A. The dotted lines represent the 
estimates asayes = 0.27 and Xsayes = 0.89. 
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